Entity data attribution using disparate data sets

ABSTRACT

Systems and methods for using disparate data sets to attribute data to an entity are disclosed. Disparate data sets can be obtained from a variety of data sources. The disclosed systems and methods can obtain a first and second data set. Trajectories can represent multiple data records in a data set associated with an entity. Trajectories from the obtained data sets can be used to associate data stored among the various data sets. The association can be based on the agreement between the trajectories. The associated data records can further be used to associate the entities related to the associated data records.

CROSS REFERENCE TO RELATED APPLICATION

This application is a continuation of, and claims priority to U.S. patent application Ser. No. 17/159,510, which was filed on Jan. 27, 2021, which application is a continuation of, and claims priority to U.S. patent application Ser. No. 16/209,763 (now U.S. Pat. No. 10,942,935), which was filed on Dec. 4, 2018, which application is a continuation of, and claims priority to U.S. patent application Ser. No. 15/209,544 (now U.S. Pat. No. 10,223,429), which was filed on Jul. 13, 2016, which claims priority to U.S. Provisional Patent Application No. 62/261,744, which was filed on Dec. 1, 2015, and the disclosures of which are expressly incorporated herein by reference in their entireties.

BACKGROUND

Vast amounts of data are readily available to analysts today, on the one hand allowing them to perform more complicated and detailed data analyses than ever but on the other hand making it more difficult to compare the data to other data sets. Different data sets can contain information relating to the same entities without any effective way of linking the data. The ability to analyze related data stored in multiple data sets can provide great opportunity for better understanding the data as a whole. Analyzing these large and potentially disparate data sets to resolve related data presents computational challenges. The ability to effectively and efficiently link data across disconnected data sets can provide valuable insights not discernible from the individual data sets alone.

BRIEF DESCRIPTION OF THE DRAWINGS

Reference will now be made to the accompanying drawings showing example embodiments of the present application, and in which:

FIG. 1 illustrates, in block diagram form, an exemplary data fusion system for providing data analysis, consistent with embodiments of the present disclosure.

FIG. 2 is a block diagram of an exemplary system for analyzing disparate data sets, consistent with embodiments of the present disclosure.

FIG. 3 is a block diagram of an exemplary computer system, consistent with embodiments of the present disclosure.

FIG. 4 is a block diagram of an exemplary data structure accessed in the process of analyzing disparate data sets, consistent with the embodiments of the present disclosure.

FIG. 5 is a block diagram of an exemplary data structure accessed in the process of analyzing disparate data sets, consistent with the embodiments of the present disclosure.

FIG. 6 is a block diagram of an exemplary system for data attribution and analysis using disparate data sets, consistent with the embodiments of the present disclosure.

FIG. 7 is a flowchart representing an exemplary process for data attribution and analysis using disparate data sets, consistent with embodiments of the present disclosure.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

Reference will now be made in detail to exemplary embodiments, the examples of which are illustrated in the accompanying drawings. Whenever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.

Generally, embodiments of the present disclosure relate to analyzing and attributing data in disparate data sets. The data represented by the various data sets can include data regarding interactions between individuals or entities. Large sets of interaction data may be filtered according to selectable criteria to provide, for example, information associated with the location and timing of specific interactions. Such selectable criteria may include an address, geographic coordinates, times of interactions, time spent between interactions, demographics of individuals involved, types of entities or organizations involved, etc. Using attributes of the data in the disparate data sets, embodiments of the present disclosure can determine agreement among the data sets, allowing for the determination of related data and entities despite no clear overlap in identification information among the various data sets.

In some embodiments, data can be stored in multiple, disparate data sets. For example, data related to an interaction may be stored in one database while information related to the circumstances leading to that interaction may be stored in a separate database. Examples of data sets can include location information attached to social media activity, mobile device details, and past interactions. Although stored in separate databases and data sets, this data may contain information that allows for the attribution of data across those databases or data sets that involve common entities or individuals.

In some embodiments, data from the data sets may be processed to determine trajectories that represent data within a data set associated with particular interactions or information involving a single individual or entity. A trajectory can be a representation of actions, attributes, behaviors, and/or records associated with the same entity. When considering multiple interactions or records associated with an entity, trajectories can provide a mechanism for evaluating that entity across those interactions, multiple data records, or multiple data points in a way that does not rely on specific fields in each individual record or attributes associated with each individual data point. By creating trajectories representing an entity, additional insight and information can be obtained that may not be available from analyzing the individual records alone.

Moreover, trajectories can provide an additional data point for comparing and contrasting two or more entities. For example, a data set may include multiple interactions associated with a particular individual. A trajectory associated with these interactions may reveal the individual's movement or behavior. Such a trajectory could be compared or contrasted with similar analysis of other data sets to find commonality or agreement that may indicate that the data sets are associated with the same individual. Trajectories can be created for multiple data sets and many different types of data. Trajectories from different sources that represent similar types of data can be compared to determine if data within the data sets agree. For example, a trajectory for a data set representing online browsing history can be compared to trajectories for a data set involving interactions in an offline context. Trajectories that match can indicate that the two data sets contain information related to the same individual.
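The comparison of trajectories across two data sets can be illustrated with a minimal sketch. It assumes, purely for illustration, that each record has already been reduced to a generalized place and a date, and that agreement is measured by counting shared (place, date) points. The function names, the record layout, and the `min_shared` threshold are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def build_trajectories(records):
    """Group (entity_id, place, day) records into one set of (place, day) points per entity."""
    trajectories = defaultdict(set)
    for entity_id, place, day in records:
        trajectories[entity_id].add((place, day))
    return dict(trajectories)


def match_trajectories(first, second, min_shared=2):
    """Pair entities across two data sets whose trajectories share at least min_shared points."""
    matches = []
    for id_a, points_a in first.items():
        for id_b, points_b in second.items():
            shared = len(points_a & points_b)
            if shared >= min_shared:
                matches.append((id_a, id_b, shared))
    return matches


# Hypothetical records: (entity identifier, generalized place, date).
online = [
    ("cookie-1", "94301", "2015-11-02"),
    ("cookie-1", "94025", "2015-11-05"),
    ("cookie-2", "10001", "2015-11-02"),
]
offline = [
    ("card-9", "94301", "2015-11-02"),
    ("card-9", "94025", "2015-11-05"),
    ("card-7", "60601", "2015-11-03"),
]

print(match_trajectories(build_trajectories(online), build_trajectories(offline)))
```

In this sketch the two data sets share no identifier, yet the entities "cookie-1" and "card-9" are paired because their trajectories agree on two points. Comparing every pair of entities is quadratic and is used here only to keep the example short.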

In some embodiments, the basis for the trajectories can vary based on the type and nature of the data in the data set. For example, the number of matches among trajectories in different data sets needed to determine a positive association can be adjusted based on the characteristics of each data set and the amount of overlap between each data set.

In some embodiments, the analysis may involve location information. When utilizing location information, the trajectory analysis can include specific data points, e.g., specific latitude and longitude coordinates, or can include all locations within a radius of a specific point. Depending on the precision of the underlying data sets, the trajectory analysis can consider varied levels of detail. Additionally, the location information can be generalized. For example, location information could be a city, neighborhood, a state, a zip code, or other similar designation. The level of granularity can depend on the specific application.
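As a sketch of how a radius test and a coarser level of location detail might be expressed, the following uses the haversine great-circle formula and simple coordinate rounding. Both are assumptions chosen for illustration. The disclosure does not specify a distance formula or a generalization method, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(point, center, radius_km):
    """True if point lies within radius_km of center; both are (lat, lon) tuples."""
    return haversine_km(point[0], point[1], center[0], center[1]) <= radius_km


def generalize(point, decimals):
    """Coarsen a (lat, lon) point by rounding; fewer decimals means a coarser location."""
    return (round(point[0], decimals), round(point[1], decimals))
```

Rounding to one decimal place groups points into cells roughly 11 km tall in latitude, which is one simple way to trade precision for tolerance when the underlying data sets differ in accuracy.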

Similarly, in some embodiments the analysis may involve date and time information. When utilizing date and time information, the trajectory analysis can include specific dates and times or allow for ranges of dates and times to account for differences in the data that still can refer to a single interaction or event. Similar to location information, the precision used in the trajectory analysis can vary depending on the underlying data. For example, dates and times may be granular to the minute or can be specified at a generic level, such as “a week ago.” More or fewer levels of granularity are possible and the specific requirements of the application can dictate the appropriate levels of granularity.
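The two ideas in this paragraph, truncating timestamps to a chosen granularity and tolerating a range around a time, can be sketched as follows. The granularity names and the tolerance parameter are illustrative assumptions rather than values given in the disclosure.

```python
from datetime import datetime, timedelta


def bucket(timestamp, granularity):
    """Truncate a timestamp to 'minute', 'hour', or 'day' granularity."""
    if granularity == "minute":
        return timestamp.replace(second=0, microsecond=0)
    if granularity == "hour":
        return timestamp.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def same_event(first, second, tolerance):
    """Treat two timestamps as one interaction if they differ by at most tolerance."""
    return abs(first - second) <= tolerance


swipe = datetime(2015, 12, 1, 10, 0, 0)
check_in = datetime(2015, 12, 1, 10, 4, 30)
print(same_event(swipe, check_in, timedelta(minutes=5)))
print(bucket(check_in, "hour"))
```

A tolerance window avoids a weakness of pure bucketing, where two timestamps a few seconds apart can fall on opposite sides of a bucket boundary.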

Additional aspects of embodiments consistent with the present disclosure include implicit and explicit comparisons of data in the disparate data sets. For example, multiple data sets may include the same unique information, allowing an explicit attribution of data in one data set with data in another data set. In some embodiments, probabilistic analysis can help determine confidence levels in the precision of trajectory calculations and comparisons. Embodiments consistent with the present disclosure can allow for efficient analysis of large, seemingly unrelated data sets, providing accurate attribution of information in the data sets to a common entity or individual. These attributions can provide significant advantages to those consuming the data.

FIG. 1 illustrates, in block diagram form, an exemplary data fusion system 100 for providing data analysis, consistent with embodiments of the present disclosure. Among other things, data fusion system 100 facilitates transformation of one or more data sources, such as data sources 130 (e.g., which can be data systems 210-250 shown in FIG. 2 and described in more detail below) into an object model 160 whose semantics are defined by an ontology 150. The transformation can be performed for a variety of reasons. For example, data can be imported from data sources 130 into a database 170 for persistently storing object model 160. As another example, a data presentation component (not depicted) can transform input data from data sources 130 “on the fly” into object model 160. The object model 160 can then be utilized, in conjunction with ontology 150, for analysis through graphs and/or other data visualization techniques.

Data fusion system 100 comprises a definition component 110 and a translation component 120, both implemented by one or more processors of one or more computing devices or systems executing hardware and/or software-based logic for providing various functionality and features of the present disclosure, as described herein. As will be appreciated from the present disclosure, data fusion system 100 can comprise fewer or additional components that provide the various functionalities and features described herein. Moreover, the number and arrangement of the components of data fusion system 100 responsible for providing the various functionalities and features described herein can further vary from embodiment to embodiment.

Definition component 110 generates and/or modifies ontology 150 and a schema map 140. Exemplary embodiments for defining an ontology (such as ontology 150) are described in U.S. Pat. No. 7,962,495 (the '495 patent), issued on Jun. 14, 2011, the entire contents of which are expressly incorporated herein by reference for all purposes. Consistent with certain embodiments disclosed in the '495 patent, a dynamic ontology may be used to create a database. To create a database ontology, one or more object types may be defined, where each object type includes one or more properties. The attributes of object types or property types of the ontology can be edited or modified at any time. And, for each property type, at least one parser definition may be created. The attributes of a parser definition can be edited or modified at any time.

In some embodiments, each property type is declared to be representative of one or more object types. A property type is representative of an object type when the property type is intuitively associated with the object type. Alternatively, each property type has one or more components and a base type. In some embodiments, a property type can comprise a string, a date, a number, or a composite type consisting of two or more string, date, or number elements. Thus, property types are extensible and can represent complex data structures. Further, a parser definition can reference a component of a complex property type as a unit or token.

An example of a property having multiple components is an Address property having a City component and a State component. An example of raw input data is “Los Angeles, CA.” An example parser definition specifies an association of imported input data to object property components as follows: {CITY}, {STATE}→Address:City, Address:State. In some embodiments, the association {CITY}, {STATE} is defined in a parser definition using regular expression symbology. The association {CITY}, {STATE} indicates that a city string followed by a state string, and separated by a comma, comprises valid input data for a property of type Address. In contrast, input data of “Los Angeles CA” would not be valid for the specified parser definition, but a user could create a second parser definition that does match input data of “Los Angeles CA.” The definition Address:City, Address:State specifies that matching input data values map to components named “City” and “State” of the Address property. As a result, parsing the input data using the parser definition results in assigning the value “Los Angeles” to the Address:City component of the Address property, and the value “CA” to the Address:State component of the Address property.
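The Address parser definition above can be approximated with a regular expression. This is a hedged sketch: the disclosure states only that regular expression symbology can be used, so the specific pattern, the two-letter state assumption, and the `parse_address` helper are illustrative choices.

```python
import re

# Assumed rendering of the {CITY}, {STATE} association: a city string, a comma,
# then a two-letter state abbreviation.
CITY_COMMA_STATE = re.compile(r"^(?P<City>[^,]+),\s*(?P<State>[A-Z]{2})$")


def parse_address(raw):
    """Map raw input to Address property components, or return None if the input is not valid."""
    match = CITY_COMMA_STATE.match(raw.strip())
    if match is None:
        return None
    return {
        "Address:City": match.group("City").strip(),
        "Address:State": match.group("State"),
    }


print(parse_address("Los Angeles, CA"))
print(parse_address("Los Angeles CA"))
```

The first call yields the City and State components, while the second returns None because the comma is missing. A second pattern would be needed to accept that form, mirroring the second parser definition described above.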

According to some embodiments, schema map 140 can define how various elements of schemas 135 for data sources 130 map to various elements of ontology 150. Definition component 110 receives, calculates, extracts, or otherwise identifies schemas 135 for data sources 130. Schemas 135 define the structure of data sources 130; for example, the names and other characteristics of tables, files, columns, fields, properties, and so forth. Definition component 110 furthermore optionally identifies sample data 136 from data sources 130. Definition component 110 can further identify object type, relationship, and property definitions from ontology 150, if any already exist. Definition component 110 can further identify pre-existing mappings from schema map 140, if such mappings exist.

Based on the identified information, definition component 110 can generate a graphical user interface 115. Graphical user interface 115 can be presented to users of a computing device via any suitable output mechanism (e.g., a display screen, an image projection, etc.), and can further accept input from users of the computing device via any suitable input mechanism (e.g., a keyboard, a mouse, a touch screen interface, etc.). Graphical user interface 115 features a visual workspace that visually depicts representations of the elements of ontology 150 for which mappings are defined in schema map 140.

In some embodiments, transformation component 120 can be invoked after schema map 140 and ontology 150 have been defined or redefined. Transformation component 120 identifies schema map 140 and ontology 150. Transformation component 120 further reads data sources 130 and identifies schemas 135 for data sources 130. For each element of ontology 150 described in schema map 140, transformation component 120 iterates through some or all of the data items of data sources 130, generating elements of object model 160 in the manner specified by schema map 140. In some embodiments, transformation component 120 can store a representation of each generated element of object model 160 in a database 170. In some embodiments, transformation component 120 is further configured to synchronize changes in object model 160 back to data sources 130.
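The iteration described above can be sketched in a few lines. The dictionary-based schema map, the row format, and the `transform` function are hypothetical simplifications introduced for illustration. The disclosure does not prescribe these structures.

```python
def transform(rows, schema_map):
    """Generate object-model elements from source rows as directed by a schema map.

    schema_map maps each ontology object type to {property name: source column}.
    """
    object_model = []
    for object_type, property_columns in schema_map.items():
        for row in rows:
            properties = {
                prop: row[column]
                for prop, column in property_columns.items()
                if column in row
            }
            if properties:  # skip rows that contribute nothing to this object type
                object_model.append({"type": object_type, "properties": properties})
    return object_model


# Hypothetical source rows and schema map.
rows = [{"cust_nm": "A. Smith", "zip": "94301"}]
schema_map = {
    "Person": {"name": "cust_nm"},
    "Location": {"postal_code": "zip"},
}
print(transform(rows, schema_map))
```

The outer loop follows the ontology elements named in the schema map and the inner loop walks the data items, which matches the order of iteration in the paragraph above. Storing the result in a database and synchronizing changes back to the sources are omitted.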

Data sources 130 can be one or more sources of data, including, without limitation, spreadsheet files, databases, email folders, document collections, media collections, contact directories, and so forth. Data sources 130 can include data structures stored persistently in non-volatile memory. Data sources 130 can also or alternatively include temporary data structures generated from underlying data sources via data extraction components, such as a result set returned from a database server executing a database query.

Schema map 140, ontology 150, and schemas 135 can be stored in any suitable structures, such as XML files, database tables, and so forth. In some embodiments, ontology 150 is maintained persistently. Schema map 140 may or may not be maintained persistently, depending on whether the transformation process is perpetual or a one-time event. Schemas 135 need not be maintained in persistent memory, but can be cached for optimization.

Object model 160 comprises collections of elements such as typed objects, properties, and relationships. The collections can be structured in any suitable manner. In some embodiments, a database 170 stores the elements of object model 160, or representations thereof. Alternatively, the elements of object model 160 are stored within database 170 in a different underlying format, such as in a series of object, property, and relationship tables in a relational database.

According to some embodiments, the functionalities, techniques, and components described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices can be hard-wired to perform the techniques, or can include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or can include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices can also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices can be desktop computer systems, portable computer systems, handheld devices, networking devices, or any other device that incorporates hard-wired and/or program logic to implement the techniques.

Throughout this disclosure, reference will be made to an entity such as, for example, a provisioning entity and a consuming entity. It will be understood that a provisioning entity can include, for example, a merchant, a retail provisioning entity or the like, and a consuming entity can include, for example, a consumer user buying products or services from a provisioning entity. It will be understood that a consuming entity can represent either individual persons or can represent a group of persons (e.g., a group of persons living under one roof as part of a family). In some embodiments, a consuming entity can be associated with a credit card number of an individual or a credit card number for an entire family sharing one credit card. It will also be understood that a provisioning entity can represent either the entity itself or individual persons involved with the entity.

In embodiments consistent with the present disclosure, data fusion system 100 can provide processed data from disparate data sources to an analysis system. For example, data stored in different data sets may use a variety of forms for location information. One data set can use address information while another data set can use latitude and longitude. Moreover, different data sets that use address information may store an address as one single entry or divided into logical components such as street number, street name, city, state, and zip code. Data fusion system 100 can provide a mechanism to control data intake and process the data sets to store or provide representations of the data sets that conform to a consistent object model. This can allow an analysis engine to process data in a consistent manner without needing to account for differences in the way data are stored. For example, an analysis engine consistent with embodiments of the present disclosure can calculate trajectories for multiple data sets having location information without needing to account for differences in how the locations are stored in the original data. In these embodiments, data fusion system 100 provides a consistent object model to the analysis engine to allow for simplified trajectory calculations and comparisons.
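One way to picture a consistent object model for location data is a single record type that every source format is converted into. The `Location` class, its field names, the converter functions, and the assumed "street, city, state zip" single-line layout are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    """A single, consistent representation for locations from any source (illustrative)."""
    street: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    postal_code: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def from_components(record):
    """Convert an address stored as logical components."""
    return Location(
        street=f"{record['street_number']} {record['street_name']}",
        city=record["city"],
        state=record["state"],
        postal_code=record["zip"],
    )


def from_single_line(text):
    """Convert an address stored as one entry, assuming 'street, city, state zip'."""
    street, city, state_zip = [part.strip() for part in text.split(",")]
    state, postal_code = state_zip.split()
    return Location(street=street, city=city, state=state, postal_code=postal_code)


def from_coordinates(record):
    """Convert a record that stores latitude and longitude."""
    return Location(latitude=float(record["lat"]), longitude=float(record["lon"]))


a = from_single_line("100 Main St, Palo Alto, CA 94301")
b = from_components({"street_number": "100", "street_name": "Main St",
                     "city": "Palo Alto", "state": "CA", "zip": "94301"})
print(a == b)
```

Once both address formats produce equal `Location` values, downstream trajectory code can compare them directly. Relating an address to a coordinate pair would additionally require a geocoding step, which this sketch does not attempt.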

FIG. 2 is a block diagram of an exemplary system 200 for acquiring and comparing data from disparate data sets, consistent with disclosed embodiments. In some embodiments, system 200 can include analysis engine 210, one or more financial services systems 220, one or more geographic data systems 230, one or more provisioning entity management systems 240, and one or more consuming entity data systems 250. The components and arrangement of the components included in system 200 can vary depending on the embodiment. For example, analysis engine 210 can interact with geographic data systems 230 and financial services systems 220 without using data from the other components. Thus, system 200 can include fewer or additional components that perform or assist in the analysis of data sets.

One or more components of system 200 can include computing systems configured to provide different types of data. As further described herein, components of system 200 can include one or more computing devices (e.g., computer(s), server(s), etc.), memory storing data and/or software instructions (e.g., database(s), memory devices, etc.), and other appropriate computing components. In some embodiments, the one or more computing devices are configured to execute software or a set of programmable instructions stored on one or more memory devices to perform one or more operations, consistent with the disclosed embodiments. Components of system 200 can be configured to communicate with one or more other components of system 200, including analysis engine 210, one or more financial services systems 220, one or more geographic data systems 230, one or more provisioning entity management systems 240, and one or more consuming entity data systems 250. In certain aspects, users can operate one or more components of system 200. The one or more users can be employees of, or associated with, the entity corresponding to the respective component(s) (e.g., someone authorized to use the underlying computing systems or otherwise act on behalf of the entity).

Analysis engine 210 can be a computing system configured to store, organize and process data sets. For example, analysis engine 210 can be a computer system configured to execute software or a set of programmable instructions that collect or receive financial interaction data, consumer data, and provisioning entity data and process the data to determine related information. Analysis engine 210 can be configured, in some embodiments, to utilize, include, or be a data fusion system 100 (see, e.g., FIG. 1) to transform data from various data sources (such as financial services systems 220, geographic data systems 230, provisioning entity management systems 240, and consuming entity data systems 250) for processing. In some embodiments, analysis engine 210 can be implemented using a computer system 300, as shown in FIG. 3 and described below.

Analysis engine 210 can include one or more computing devices (e.g., server(s)), memory storing data and/or software instructions (e.g., database(s), memory devices, etc.) and other known computing components. According to some embodiments, analysis engine 210 can include one or more networked computers that execute processing in parallel or use a distributed computing architecture. Analysis engine 210 can be configured to communicate with one or more components of system 200, and it can be configured to consume data sets associated with those systems. It is appreciated that the systems shown in FIG. 2 are illustrative, and other systems may provide data to analysis engine 210.

Financial services system 220 can be a computing system associated with a financial service provider, such as a bank, credit card issuer, credit bureau, credit agency, or other entity that generates, provides, manages, and/or maintains financial service accounts for one or more users. Financial services system 220 can generate, maintain, store, provide, and/or process financial data associated with one or more financial service accounts. Financial data can include, for example, financial service account data, such as financial service account identification data, account balance, available credit, existing fees, reward points, user profile information, and financial service account interaction data, such as interaction dates, interaction amounts, interaction types, and location of interaction. In some embodiments, each interaction of financial data can include several categories of information associated with the interaction. For example, each interaction can include categories such as number category; consuming entity identification category; consuming entity location category; provisioning entity identification category; provisioning entity location category; type of provisioning entity category; interaction amount category; and time of interaction category, as described in FIG. 4. It will be appreciated that financial data can comprise either additional or fewer categories than the exemplary categories listed above. Financial services system 220 can include infrastructure and components that are configured to generate and/or provide financial service accounts such as credit card accounts, checking accounts, savings accounts, debit card accounts, loyalty or reward programs, lines of credit, and the like.

Geographic data systems 230 can include one or more computing devices configured to provide geographic data to other computing systems in system 200 such as analysis engine 210. For example, geographic data systems 230 can provide geodetic coordinates when provided with a street address or vice versa. In some embodiments, geographic data systems 230 expose an application programming interface (API) including one or more methods or functions that can be called remotely over a network, such as network 270. In some embodiments, geographic data systems 230 can provide information concerning a radius around a specific point. For example, analysis engine 210 can provide two addresses, and geographic data systems 230 can provide, in response, whether or not one address is within a threshold distance of the other address.

Provisioning entity management systems 240 can include one or more processors configured to execute software instructions stored in memory. Provisioning entity management systems 240 can include software or a set of programmable instructions that when executed by a processor perform known Internet-related communication. For example, provisioning entity management systems 240 can provide and execute software or a set of instructions that provides interfaces to retrieve data stored in provisioning entity management systems 240. The disclosed embodiments are not limited to any particular configuration of provisioning entity management systems 240.

Provisioning entity management systems 240 can be one or more computing systems associated with a provisioning entity that provides products (e.g., goods and/or services), such as a restaurant (e.g., Outback Steakhouse®, Burger King®, etc.), retailer (e.g., Amazon.com®, Target®, etc.), grocery store, mall, shopping center, service provider (e.g., utility company, insurance company, financial service provider, automobile repair services, movie theater, etc.), non-profit organization (ACLU™, AARP®, etc.) or any other type of entity that provides goods, services, and/or information that consuming entities (i.e., end users or other business entities) can purchase, consume, use, etc. For ease of discussion, the exemplary embodiments presented herein that discuss provisioning entities relate to entities whose interactions involve goods and services. Provisioning entity management systems 240, however, are not limited to systems associated with retail provisioning entities that conduct business in any particular industry or field.

Provisioning entity management systems 240 can be associated with computer systems installed and used at brick-and-mortar provisioning entity locations where a consumer can physically visit and purchase goods and services. Such locations can include computing devices that perform financial service interactions with consumers (e.g., Point of Sale (POS) terminal(s), kiosks, etc.). Provisioning entity management systems 240 can also include back and/or front-end computing components that store data and execute software or a set of instructions to perform operations consistent with disclosed embodiments, such as computers that are operated by employees of the provisioning entity (e.g., back office systems, etc.). Provisioning entity management systems 240 can also be associated with a provisioning entity that provides goods and/or services via known online or e-commerce types of solutions. For example, such a provisioning entity can sell products via a website using known online or e-commerce systems and solutions to market, sell, and process online interactions. Provisioning entity management systems 240 can include one or more servers that are configured to execute stored software or a set of instructions to perform operations associated with a provisioning entity, including, for example, one or more processes associated with processing purchase interactions, generating interaction data, and generating product data (e.g., SKU data) relating to purchase interactions.

Provisioning entity management systems 240 can be one or more computing devices configured to provide provisioning entity analysis and data to analysis engine 210. For example, provisioning entity management systems 240 can be a desktop computer, a laptop, a server, a mobile device (e.g., tablet, smart phone, etc.), or any other type of computing device configured to provide data access to analysis engine 210 for data related to the provisioning entity management systems 240. For example, provisioning entity management systems 240 can generate, maintain, store, provide, and/or process financial data associated with one or more merchants or provisioning entities. Provisioning entity data can include, inter alia, customer interaction data that consists of, for example, store visits, individual transaction information, credit card usage, purchase history information, loyalty accounts, service requests, customer service records, transaction locations, and customer information.

Consuming entity data systems 250 can include one or more computing devices configured to provide demographic or other data regarding consumers. For example, consuming entity data systems 250 can provide information regarding the name, address, gender, income level, age, email address, or other information about consumers. Consuming entity data systems 250 can include public computing systems such as computing systems affiliated with the U.S. Bureau of the Census, the U.S. Bureau of Labor Statistics, or FedStats, or it can include private computing systems such as computing systems affiliated with financial institutions, credit bureaus, social media sites, marketing services, advertising agencies, or some other organization that collects and provides demographic data or data about individual consumers. In some embodiments, consuming entity data systems 250 can include advertising information related to individual consumers such as ad views, clicks, ad impressions, ad details, or other advertisement-related information. In some embodiments, consuming entity data systems 250 may include web browsing history (e.g., browsing data provided by Apple Safari, Microsoft Internet Explorer, Google Chrome, Mozilla Firefox, or other web browser), social network interactions (e.g., from social networking providers like, among others, Facebook, LinkedIn, and Instagram), or other available online behavior related to a consumer or group of consumers.

Network 270 can be any type of network or combination of networks configured to provide electronic communications between components of system 200. For example, network 270 can be any type of network (including infrastructure) that provides communications, exchanges information, and/or facilitates the exchange of information, such as the Internet, a Local Area Network, or other suitable connection(s) that enables the sending and receiving of information between the components of system 200. Network 270 may also comprise any combination of wired and wireless networks. In other embodiments, one or more components of system 200 can communicate directly through a dedicated communication link(s), such as links between analysis engine 210, financial services system 220, geographic data systems 230, provisioning entity management systems 240, and consuming entity data systems 250.

As noted above, analysis engine 210 can include a data fusion system (e.g., data fusion system 100) for organizing data received from one or more of the components of system 200. Analysis engine 210 can query other components of system 200 and consume information provided by the other components of system 200. Analysis engine 210 can attribute data from one system with data from another system. For example, in some embodiments analysis engine 210 attributes transactions and purchase history from financial services systems 220 with online advertising information from consuming entity data systems 250 to correlate advertisements that lead to purchases in brick-and-mortar stores. Moreover, analysis engine 210 can combine information stored in multiple data sets associated with each component and/or multiple data sets from different components of system 200.

FIG. 3 is a block diagram of an exemplary computer system 300, consistent with embodiments of the present disclosure. The components of system 200 such as analysis engine 210, financial services systems 220, geographic data systems 230, provisioning entity management systems 240, and consuming entity data systems 250 may include an architecture based on or similar to that of computer system 300.

As illustrated in FIG. 3, computer system 300 includes a bus 302 or other communication mechanism for communicating information, and one or more hardware processors 304 (denoted as processor 304 for purposes of simplicity) coupled with bus 302 for processing information. Hardware processor 304 can be, for example, one or more general-purpose microprocessors or it can be a reduced instruction set of one or more microprocessors.

Computer system 300 also includes a main memory 306, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 302 for storing information and instructions to be executed by processor 304. Main memory 306 also can be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 304. Such instructions, after being stored in non-transitory storage media accessible to processor 304, render computer system 300 into a special-purpose machine that is customized to perform the operations specified in the instructions.

Computer system 300 further includes a read only memory (ROM) 308 or other static storage device coupled to bus 302 for storing static information and instructions for processor 304. A storage device 310, such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc. is provided and coupled to bus 302 for storing information and instructions.

Computer system 300 can be coupled via bus 302 to a display 312, such as a cathode ray tube (CRT), liquid crystal display, or touch screen, for displaying information to a computer user. An input device 314, including alphanumeric and other keys, is coupled to bus 302 for communicating information and command selections to processor 304. Another type of user input device is cursor control 316, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 304 and for controlling cursor movement on display 312. The input device typically has two degrees of freedom in two axes, a first axis (for example, x) and a second axis (for example, y), that allows the device to specify positions in a plane. In some embodiments, the same direction information and command selections as cursor control can be implemented via receiving touches on a touch screen without a cursor.

Computer system 300 can include a user interface module to implement a graphical user interface that can be stored in a mass storage device as executable software codes that are executed by the one or more computing devices. This and other modules can include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

In general, the word “module,” as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, Lua, C or C++. A software module can be compiled and linked into an executable program, installed in a dynamic link library, or written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software modules can be callable from other modules or from themselves, and/or can be invoked in response to detected events or interrupts. Software modules configured for execution on computing devices can be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disc, or any other tangible medium, or as a digital download (and can be originally stored in a compressed or installable format that requires installation, decompression, or decryption prior to execution). Such software code can be stored, partially or fully, on a memory device of the executing computing device, for execution by the computing device. Software instructions can be embedded in firmware, such as an EPROM. It will be further appreciated that hardware modules can be comprised of connected logic units, such as gates and flip-flops, and/or can be comprised of programmable units, such as programmable gate arrays or processors. The modules or computing device functionality described herein are preferably implemented as software modules, but can be represented in hardware or firmware. Generally, the modules described herein refer to logical modules that can be combined with other modules or divided into sub-modules despite their physical organization or storage.

Computer system 300 can implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 300 to be a special-purpose machine. According to some embodiments, the operations, functionalities, and techniques and other features described herein are performed by computer system 300 in response to processor 304 executing one or more sequences of one or more instructions contained in main memory 306. Such instructions can be read into main memory 306 from another storage medium, such as storage device 310. Execution of the sequences of instructions contained in main memory 306 causes processor 304 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry can be used in place of or in combination with software instructions.

The term “non-transitory media” as used herein refers to any non-transitory media storing data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media can comprise non-volatile media and/or volatile media. Non-volatile media can include, for example, optical or magnetic disks, such as storage device 310. Volatile media can include dynamic memory, such as main memory 306. Common forms of non-transitory media can include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.

Non-transitory media is distinct from, but can be used in conjunction with, transmission media. Transmission media can participate in transferring information between storage media. For example, transmission media can include coaxial cables, copper wire and fiber optics, including the wires that comprise bus 302. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

Various forms of media can be involved in carrying one or more sequences of one or more instructions to processor 304 for execution. For example, the instructions can initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 300 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 302. Bus 302 carries the data to main memory 306, from which processor 304 retrieves and executes the instructions. The instructions received by main memory 306 can optionally be stored on storage device 310 either before or after execution by processor 304.

Computer system 300 can also include a communication interface 318 coupled to bus 302. Communication interface 318 can provide a two-way data communication coupling to a network link 320 that can be connected to a local network 322. For example, communication interface 318 can be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 318 can be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links can also be implemented. In any such implementation, communication interface 318 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

Network link 320 can typically provide data communication through one or more networks to other data devices. For example, network link 320 can provide a connection through local network 322 to a host computer 324 or to data equipment operated by an Internet Service Provider (ISP) 326. ISP 326 in turn can provide data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 328. Local network 322 and Internet 328 can both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 320 and through communication interface 318, which carry the digital data to and from computer system 300, can be example forms of transmission media.

Computer system 300 can send messages and receive data, including program code, through the network(s), network link 320 and communication interface 318. In the Internet example, a server 330 can transmit a requested code for an application program through Internet 328, ISP 326, local network 322 and communication interface 318. The received code can be executed by processor 304 as it is received, and/or stored in storage device 310, or other non-volatile storage for later execution. In some embodiments, server 330 can provide information for being displayed on a display.

FIG. 4 is a block diagram of an exemplary data structure 400, consistent with embodiments of the present disclosure. Data structure 400 can store data records associated with interactions involving multiple entities. Data structure 400 can be, for example, a database (e.g., database 170) that can store elements of an object model (e.g., object model 160). In some embodiments, data structure 400 can be a Relational Database Management System (RDBMS) that stores interaction data as sections of rows of data in relational tables. An RDBMS can be designed to efficiently return data for an entire row, or record, in as few operations as possible. An RDBMS can store data by serializing each row of data of data structure 400. For example, in an RDBMS, data associated with interaction 1 of FIG. 4 can be stored serially such that data associated with all categories of interaction 1 can be accessed in one operation.

Alternatively, data structure 400 can be a column-oriented database management system that stores data as sections of columns of data rather than rows of data. This column-oriented DBMS can have advantages, for example, for data warehouses, customer relationship management systems, library card catalogs, and other ad hoc inquiry systems where aggregates are computed over large numbers of similar data items. A column-oriented DBMS can be more efficient than an RDBMS when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data. A column-oriented DBMS can be designed to efficiently return data for an entire column, in as few operations as possible. A column-oriented DBMS can store data by serializing each column of data of data structure 400. For example, in a column-oriented DBMS, data associated with a category (e.g., consuming entity identification category 420) can be stored serially such that data associated with that category for all interactions of data structure 400 can be accessed in one operation.
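The difference between the two layouts can be illustrated with a minimal in-memory sketch. It is not a database implementation; the function name `to_columns`, the field names, and the amounts for interactions 2 and 3 are assumptions made only for illustration.

```python
from typing import Any

# Three hypothetical interactions with a subset of the FIG. 4 categories.
# Only "User 1", "CE002", and 74.56 come from the description; the rest is made up.
rows = [
    {"number": 1, "consuming_entity": "User 1", "amount": 74.56},
    {"number": 2, "consuming_entity": "CE002", "amount": 12.00},
    {"number": 3, "consuming_entity": "User 1", "amount": 30.25},
]


def to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list per category (column)."""
    columns: dict[str, list[Any]] = {}
    for record in records:
        for category, value in record.items():
            columns.setdefault(category, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every category of interaction 1 comes back in one lookup.
first_interaction = rows[0]

# Column layout: an aggregate over one category touches only that column.
total_amount = sum(columns["amount"])
```

In the row layout a single access yields a whole record, while in the column layout a single access yields one category across all interactions, which is the case the aggregate over `amount` exercises.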

As shown in FIG. 4, data structure 400 can comprise data associated with a very large number of interactions associated with multiple entities. For example, data structure 400 can include 50 billion or more interactions. In some embodiments, interactions associated with multiple entities can be referred to as transactions between multiple entities. Where appropriate, the terms interactions and transactions are intended to convey the same meaning and can be used interchangeably throughout this disclosure. While each interaction of data structure 400 is depicted as a separate row in FIG. 4, it will be understood that each such interaction can be represented by a column or any other known technique in the art. Each interaction data can include several categories of information. For example, the several categories can include number category 410; consuming entity identification category 420; consuming entity location category 430; provisioning entity identification category 440; provisioning entity location category 450; type of provisioning entity category 460; interaction amount category 470; and time of interaction category 480. It will be understood that FIG. 4 is merely exemplary and that data structure 400 can include even more categories of information associated with an interaction.

Number category 410 can uniquely identify each interaction of data structure 400. For example, data structure 400 depicts 50 billion interactions as illustrated by number category 410 of the last row of data structure 400 as 50,000,000,000. In FIG. 4, each row depicting an interaction can be identified by an element number. For example, interaction number 1 can be identified by element 401; interaction number 2 can be identified by element 402; and so on such that interaction 50,000,000,000 can be identified by element 499. It will be understood that this disclosure is not limited to any number of interactions and further that this disclosure can extend to a data structure with more or fewer than 50 billion interactions. It is also appreciated that number category 410 need not exist in data structure 400.

Consuming entity identification category 420 can identify a consuming entity. In some embodiments, consuming entity identification category 420 can represent a name (e.g., User 1 for interaction 401; User N for interaction 499) of the consuming entity. Alternatively, consuming entity identification category 420 can represent a code uniquely identifying the consuming entity (e.g., CE002 for interaction 402). For example, the identifiers under the consuming entity identification category 420 can be a credit card number that can identify a person or a family, a social security number that can identify a person, a phone number or a MAC address associated with a cell phone of a user or family, or any other identifier.

Consuming entity location category 430 can represent location information of the consuming entity. In some embodiments, consuming entity location category 430 can represent the location information by providing at least one of: a state of residence (e.g., state sub-category 432; California for interaction 401; unknown for interaction 405) of the consuming entity; a city of residence (e.g., city sub-category 434; Palo Alto for interaction 401; unknown for interaction 405) of the consuming entity; a zip code of residence (e.g., zip code sub-category 436; 94304 for interaction 401; unknown for interaction 405) of the consuming entity; and a street address of residence (e.g., street address sub-category 438; 123 Main St. for interaction 401; unknown for interaction 405) of the consuming entity.

Provisioning entity identification category 440 can identify a provisioning entity (e.g., a merchant or a coffee shop). In some embodiments, provisioning entity identification category 440 can represent a name of the provisioning entity (e.g., Merchant 2 for interaction 402). Alternatively, provisioning entity identification category 440 can represent a code uniquely identifying the provisioning entity (e.g., PE001 for interaction 401). Provisioning entity location category 450 can represent location information of the provisioning entity. In some embodiments, provisioning entity location category 450 can represent the location information by providing at least one of: a state where the provisioning entity is located (e.g., state sub-category 452; California for interaction 401; unknown for interaction 402); a city where the provisioning entity is located (e.g., city sub-category 454; Palo Alto for interaction 401; unknown for interaction 402); a zip code where the provisioning entity is located (e.g., zip code sub-category 456; 94304 for interaction 401; unknown for interaction 402); and a street address where the provisioning entity is located (e.g., street address sub-category 458; 234 University Ave. for interaction 401; unknown for interaction 402).

Type of provisioning entity category 460 can identify a type of the provisioning entity involved in each interaction. In some embodiments, type of provisioning entity category 460 of the provisioning entity can be identified by a category name customarily used in the industry (e.g., Gas Station for interaction 401) or by an identification code that can identify a type of the provisioning entity (e.g., TPE123 for interaction 403). Alternatively, type of the provisioning entity category 460 can include a merchant category code (“MCC”) used by credit card companies to identify any business that accepts one of their credit cards as a form of payment. For example, MCC can be a four-digit number assigned to a business by credit card companies (e.g., American Express™, MasterCard™, VISA™) when the business first starts accepting one of their credit cards as a form of payment.

In some embodiments, type of provisioning entity category 460 can further include a sub-category (not shown in FIG. 4), for example, type of provisioning entity sub-category 461 that can further identify a particular sub-category of provisioning entity. For example, an interaction can comprise a type of provisioning entity category 460 as a hotel and type of provisioning entity sub-category 461 as either a bed and breakfast hotel or a transit hotel. It will be understood that the above-described examples for type of provisioning entity category 460 and type of provisioning entity sub-category 461 are non-limiting and that data structure 400 can include other kinds of such categories and sub-categories associated with an interaction.

Interaction amount category 470 can represent a transaction amount (e.g., $74.56 for interaction 401) involved in each interaction. Time of interaction category 480 can represent a time at which the interaction was executed. In some embodiments, time of interaction category 480 can be represented by a date (e.g., date sub-category 482; Nov. 23, 2013, for interaction 401) and time of the day (e.g., time sub-category 484; 10:32 AM local time for interaction 401). Time sub-category 484 can be represented in either military time or some other format. Alternatively, time sub-category 484 can be represented with a local time zone of either provisioning entity location category 450 or consuming entity location category 430.
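The categories 410 through 480 described above can be pictured as one record type. The following is only a sketch: the class and field names are hypothetical, and modeling an "unknown" value as `None` is an assumption rather than something FIG. 4 prescribes. The sample values are those given for interaction 401.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Location:
    """State, city, zip code, and street address sub-categories; None means unknown."""
    state: Optional[str] = None
    city: Optional[str] = None
    zip_code: Optional[str] = None
    street_address: Optional[str] = None


@dataclass(frozen=True)
class Interaction:
    number: int                              # number category 410
    consuming_entity_id: str                 # consuming entity identification category 420
    consuming_entity_location: Location      # consuming entity location category 430
    provisioning_entity_id: str              # provisioning entity identification category 440
    provisioning_entity_location: Location   # provisioning entity location category 450
    provisioning_entity_type: str            # type of provisioning entity category 460
    amount: Decimal                          # interaction amount category 470
    time: datetime                           # time of interaction category 480 (local time)


interaction_401 = Interaction(
    number=1,
    consuming_entity_id="User 1",
    consuming_entity_location=Location("California", "Palo Alto", "94304", "123 Main St."),
    provisioning_entity_id="PE001",
    provisioning_entity_location=Location(
        "California", "Palo Alto", "94304", "234 University Ave."
    ),
    provisioning_entity_type="Gas Station",
    amount=Decimal("74.56"),
    time=datetime(2013, 11, 23, 10, 32),
)
```

A single `datetime` stands in for date sub-category 482 and time sub-category 484 together; an implementation that needs to keep them separate, or to carry a time zone, would split or extend this field.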

In some embodiments, each interaction data can include categories of information including (not shown in FIG. 4), for example, consuming entity loyalty membership category, consuming entity credit card type category, consuming entity age category, consuming entity gender category, consuming entity income category, consuming entity with children category, product information category, and service information category.

Consuming entity loyalty membership category can represent whether the consuming entity is part of a loyalty membership program associated with a provisioning entity. For example, consuming entity loyalty membership category can represent that the consuming entity is a member of one of Costco™ membership programs including Goldstar Member™, Executive Member™, and Business Member™. Consuming entity credit card type category can represent the type of credit card used by the consuming entity for a particular interaction. For example, consuming entity credit card type category can indicate that the credit card used by the consuming entity for that particular interaction can be an American Express™, MasterCard™, VISA™, or Discover™ card. In some embodiments, consuming entity credit card type category can represent a kind of MasterCard™ (e.g., Gold MasterCard™ or Platinum MasterCard™) used for a particular interaction.

In some embodiments, consuming entity demographic information can be stored in each interaction. For example, consuming entity demographic information can include at least one of: consuming entity age category, consuming entity gender category, consuming entity income category, and consuming entity with children category. In some embodiments, consuming entity age category can represent age information associated with the consuming entity; consuming entity gender category can represent gender information (e.g., Male or Female) associated with the consuming entity; consuming entity income category can represent income information (e.g., greater than $100,000 per year) associated with the consuming entity; and consuming entity with children category can represent whether the consuming entity has any children under 18 or not. For example, if the consuming entity has children under 18, a positive indication can be stored and if the consuming entity does not have children under 18, a negative indication can be stored. In some embodiments, consuming entity with children category can store information representing a number of children associated with the consuming entity.

Product information category can represent information associated with a product that is involved in an interaction. For example, product information category can represent that the product involved in the interaction is a particular type of product based on a stock keeping unit (“SKU”) of the product. In some embodiments, the product's SKU can be unique to a particular provisioning entity involved in that particular interaction. Alternatively, product information category can represent the product involved in the interaction with at least one of a Universal Product Code, International Article Number, Global Trade Item Number, and Australian Product Number. Service information category can represent information associated with a service that is involved in an interaction. For example, service information category can represent that the service involved in the interaction is a particular type of service based on an SKU of the service. It will be appreciated that an SKU can uniquely represent either a product or a service. Some examples of services can be warranties, delivery fees, installation fees, and licenses.

FIG. 5 is a block diagram of an exemplary data structure 500, consistent with embodiments of the present disclosure. Data structure 500 can store data records associated with interactions involving multiple entities, similarly to data structure 400. As with data structure 400, data structure 500 can be, for example, a database (e.g., database 170) that can store elements of an object model (e.g., object model 160). In some embodiments, data structure 500 can be an RDBMS that stores interaction data as sections of rows of data in relational tables. An RDBMS can be designed to efficiently return data for an entire row, or record, in as few operations as possible. An RDBMS can store data by serializing each row of data of data structure 500. For example, in an RDBMS, data associated with interaction 1 of FIG. 5 can be stored serially such that data associated with all categories of interaction 1 can be accessed in one operation. Alternatively, as with data structure 400, data structure 500 can be a column-oriented database management system that stores data as sections of columns of data rather than rows of data.

As shown in FIG. 5, data structure 500 can comprise data associated with a very large number of interactions associated with multiple entities. For example, data structure 500 can include 50 billion or more interactions. In some embodiments, interactions associated with multiple entities can be referred to as transactions between multiple entities. As with data structure 400, while each interaction of data structure 500 is depicted as a separate row in FIG. 5, it will be understood that each such interaction can be represented by a column or any other known technique in the art. Each interaction data can include several categories of information. For example, the several categories can include number category 510; identifier category 520; location category 530; and time of interaction category 540. It will be understood that FIG. 5 is merely exemplary and that data structure 500 can include even more categories of information associated with an interaction.

Number category 510 can uniquely identify each interaction of data structure 500. For example, data structure 500 depicts 50 billion interactions as illustrated by number category 510 of the last row of data structure 500 as 50,000,000,000. In FIG. 5, each row depicting an interaction can be identified by an element number. For example, interaction number 1 can be identified by element 501; interaction number 2 can be identified by element 502; and so on such that interaction 50,000,000,000 can be identified by element 599. It will be understood that this disclosure is not limited to any number of interactions and further that this disclosure can extend to a data structure with more or fewer than 50 billion interactions. It is also appreciated that number category 510 need not exist in data structure 500.

Identifier category 520 can identify a consuming entity. In some embodiments, identifier category 520 can represent a name or code uniquely identifying a consuming entity. For example, identifier category 520 can represent a unique identifier such as an Identifier for Advertisers (“IDFA”), Globally Unique Identifier (“GUID”), Universally Unique Identifier (“UUID”), Media Access Control (“MAC”) address, or some other unique identifier. These identifiers can be stored as hexadecimal strings (e.g., ABCD5 . . . 567 for interaction 501; DCBA1 . . . 955 for interaction 599). In some embodiments, the identifiers under identifier category 520 can be a credit card number that can identify a person or a family, a social security number that can identify a person, a phone number or a MAC address associated with a cell phone of a user or family, or any other identifier. Although identifier category 520 comprises unique types of data, interactions involving common entities can share an identifier. In this way, data structure 500 can store multiple discrete interactions corresponding to a specific consumer identifier.
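Because several interactions can share one identifier, the records for an entity can be collected by grouping on that identifier. The sketch below assumes a simplified record of (identifier, interaction number) pairs; the identifier strings and the function name `group_by_identifier` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (identifier, interaction number) pairs; the identifier values are made up.
interactions = [
    ("ABCD5-0001", 1),
    ("DCBA1-0002", 2),
    ("ABCD5-0001", 3),
]


def group_by_identifier(pairs):
    """Collect the interaction numbers recorded under each identifier, in input order."""
    grouped = defaultdict(list)
    for identifier, number in pairs:
        grouped[identifier].append(number)
    return dict(grouped)


by_identifier = group_by_identifier(interactions)
```

Here interactions 1 and 3 end up under the same identifier, which is the situation the paragraph describes: multiple discrete interactions corresponding to a specific consumer identifier.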

Consuming entity location category 530 can represent location information of the consuming entity. In some embodiments, consuming entity location category 530 can represent the location information by providing at least one of: a state of residence (e.g., state sub-category 532; California for interaction 501; unknown for interaction 505) of the consuming entity; a city of residence (e.g., city sub-category 534; Palo Alto for interaction 501; unknown for interaction 505) of the consuming entity; a zip code of residence (e.g., zip code sub-category 536; 94304 for interaction 501; unknown for interaction 505) of the consuming entity; a street address of the interaction (e.g., street address sub-category 538; 234 University Avenue for interaction 501; unknown for interaction 505); a latitude of the interaction (e.g., lat sub-category 537; 37.4292 for interaction 501; unknown for interaction 505); and a longitude of the interaction (e.g., lng sub-category 539; 122.1381 for interaction 501; unknown for interaction 505).

In some embodiments, the data in location category 530 may be inferred from other data instead of being directly provided. For example, a consumer entity can provide her location on social media platforms such as Twitter, FourSquare, LinkedIn, Facebook, or other similar platforms. Location information posted on these social media platforms can be imprecise. For example, a tweet posted on Twitter may provide only the name of a restaurant. In another example, a tweet may include generic location information such as “Near Meatpacking.” In some embodiments, the location information provided can be used directly without further processing or analysis. In other embodiments, more specific location information for the named restaurant can be obtained by comparing the restaurant to data systems containing location information for known restaurants (e.g., geographic data systems 230). If multiple restaurants have the same name, or if a restaurant has multiple locations, additional social media platform posts from the same consumer entity can help identify the specific location by providing a grouping of locations occurring within a short time period or locations that are frequented. The embodiments described can determine the necessary level of aggregation for the location information, and the detail that is calculated and/or stored can depend on the specific application.
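One way to picture the disambiguation step is to choose, among the candidate locations for a named restaurant, the one nearest to other recent posts from the same consumer entity. The sketch below is an illustrative heuristic only, not the method of the disclosure: the function names, the use of mean great-circle distance, and all coordinates are assumptions.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve_location(candidates, recent_points):
    """Pick the candidate with the smallest mean distance to the recent points.

    Returns None when the choice cannot be made: no candidates, or several
    candidates and no recent points to compare against.
    """
    if not candidates:
        return None
    if not recent_points:
        return candidates[0] if len(candidates) == 1 else None
    return min(
        candidates,
        key=lambda c: sum(haversine_km(c, p) for p in recent_points) / len(recent_points),
    )


# Hypothetical data: two restaurants sharing a name, one recent post by the same consumer.
candidates = [(37.4292, -122.1381), (40.7410, -74.0080)]
recent_points = [(37.4440, -122.1600)]
resolved = resolve_location(candidates, recent_points)
```

Returning `None` when nothing distinguishes the candidates leaves the location unresolved, in which case the imprecise information could still be used as provided.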

In some embodiments, time of interaction category 540 can be represented by a date (e.g., date sub-category 542; Nov. 23, 2013, for interaction 501) and time of the day (e.g., time sub-category 544; 10:32 AM local time for interaction 501). Time sub-category 544 can be represented in either military time or some other format. Alternatively, time sub-category 544 can be represented with a local time zone of consuming entity location category 530. Similarly to data for location category 530, time of interaction category 540 can be inferred from other available data sources. For example, a tweet from a specific location may be presented on Twitter as occurring a specific number of minutes or hours ago. From this information, an approximate time and/or date can be inferred and entered into date sub-category 542 and time sub-category 544. Similarly to the location information described above, in some embodiments the time of interaction category can be used directly as provided. For example, a tweet may indicate that it was posted “One Week Ago.” In some embodiments, no additional resolution of the specific time is computed and the time information, as provided, can be used for date sub-category 542 and time sub-category 544. As with location information, the embodiments described can determine the appropriate level of detail for date sub-category 542 and time sub-category 544 based on the specific application and can adjust the level of detail to meet specific needs.
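Inferring an approximate time from a relative label such as "45 minutes ago" amounts to subtracting the stated offset from the moment the label was observed. The following sketch assumes a small, hypothetical label grammar (a count, a unit, and the word "ago") and a hypothetical function name `infer_time`; real platform labels may differ.

```python
import re
from datetime import datetime, timedelta

# Hypothetical vocabulary; labels outside it are treated as unresolvable.
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}
_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}
_PATTERN = re.compile(r"\s*(\w+)\s+(minute|hour|day|week)s?\s+ago\s*")


def infer_time(relative, observed_at):
    """Return observed_at minus the offset named in the label, or None if unparseable."""
    match = _PATTERN.fullmatch(relative.lower())
    if match is None:
        return None
    count_text, unit = match.groups()
    count = int(count_text) if count_text.isdigit() else _WORDS.get(count_text)
    if count is None:
        return None
    return observed_at - timedelta(**{_UNITS[unit]: count})


inferred = infer_time("45 minutes ago", datetime(2013, 11, 23, 11, 17))
week_old = infer_time("One Week Ago", datetime(2013, 11, 30, 10, 32))
```

The result is only as precise as the label: a "One Week Ago" label resolves to a nominal instant, so an application might keep just the date, or store the label as provided, depending on the level of detail it needs.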

Data structure 500 can represent multiple types of information. In one embodiment, data structure 500 can represent online advertising information. In some embodiments, data structure 500 can further include a product information category to identify an advertised product. In some embodiments, data structure 500 may represent types of data such as mobile device location information, social media interactions, or consumer transaction information. Data structure 500 can include additional categories and sub-categories (not pictured in FIG. 5) specific to the various types of data that data structure 500 can represent.

Data structures 400 and 500, as shown in FIGS. 4 and 5, can exist in one or more of financial services systems 220, geographic data systems 230, provisioning entity management systems 240, or consuming entity data systems 250. These data structures can be made available to analysis engine 210 for processing.

FIG. 6 is a block diagram of system 600, consistent with embodiments of the present disclosure, for attributing data from disparate data sets to a common entity. System 600 can include provisioning entity management system 640, financial services system 620, and analysis engine 610, which can be embodiments of provisioning entity management systems 240, financial services systems 220, and analysis engine 210, respectively. Provisioning entity management system 640 and financial services system 620 can provide disparate data sets to analysis engine 610 for attribution. It is appreciated that provisioning entity management system 640 and financial services system 620 can each provide multiple data sets. Moreover, system 600 can include additional sources of data.

Provisioning entity management system 640 can represent an online advertising system. Web pages 641A-C can include online advertisements. The advertisements can be displayed through a desktop web browser such as Google Chrome, Apple Safari, or Microsoft's Internet Explorer, or the advertisements may be displayed through a web browser on a mobile device or tablet. The advertisements that are displayed may provide analytic data to the provisioning entity management system 640 responsible for providing the advertisements. In some embodiments, the advertisement can make use of a system such as IDFA to track the ad placement. Information about the ad placement can be stored in database 643. Provisioning entity management system 640 can store database 643 in a memory such as main memory 306 or storage device 310 shown in FIG. 3. The data stored in database 643 can be represented by data structure 500 as shown in FIG. 5. Data stored in database 643 can include data describing multiple advertisements shown to the same consumer entity represented by a single IDFA.

Moreover, data structure 500 can include more than one type of data. Provisioning entity management system 640 can include location data along with the advertisement data (e.g., location category 530 of data structure 500). In these embodiments, the IDFA represented in identifier category 520 can also represent a unique mobile device. Location category 530 and time category 540 of data structure 500 can further indicate the location of the consuming entity at the time the advertisement was viewed. Accordingly, all of this information can be provided to analysis engine 610.

Financial services system 620 can represent an interaction system for processing and storing transactions. Point of Sale (“POS”) terminals 623A-C can accept and process credit cards 621A-B. In some embodiments, a single credit card (e.g., credit card 621A) can be used at multiple POS systems (e.g., POS 623A and POS 623B). Data related to transactions processed by POS 623A-C can be stored in database 625. Financial services system 620 can store database 625 in a memory such as main memory 306 or storage device 310. The data stored in database 625 can be represented by data structure 400 as shown in FIG. 4.

Analysis engine 610 can analyze data provided by provisioning entity management system 640 and financial services system 620. As previously described, analysis engine 610 can be implemented using a data fusion system such as data fusion system 100. Analysis engine 610 can include translation system 611. Translation system 611 can process the data provided by financial services system 620 and provisioning entity management system 640 according to the above description of data fusion system 100. Translation system 611 can store the processed data as a consistent object model 160. Translation system 611 can provide the processed data to data processing system 613.

Data processing system 613 can analyze the disparate data sets provided by provisioning entity management system 640 and financial services system 620 through translation system 611. Data processing system 613 can further process the data from each individual data set (e.g., the data sets provided by provisioning entity management system 640 and financial services system 620, respectively) to determine overlapping patterns in each data set that may indicate that data in the multiple data sets refer to the same user, individual, or consuming entity.

In some embodiments, data processing system 613 can directly map one data set onto another data set. For example, if both the data set from provisioning entity management system 640 and the data set from financial services system 620 contain credit card information, data processing system 613 can explicitly attribute consumer entities in one data set to consumer entities in the other data set by attributing rows based on the unique credit card number. In this example, data processing system 613 may update either data set based on the attributed information in the other data set.
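
Explicit attribution on a shared unique value amounts to a keyed join. The following is a minimal sketch only; the function name `attribute_by_shared_key`, the field name `card`, and the sample rows are hypothetical and are not taken from the figures.

```python
from collections import defaultdict


def attribute_by_shared_key(first_rows, second_rows, key):
    """Pair every row of the first data set with rows of the second that share `key`."""
    index = defaultdict(list)
    for row in second_rows:
        index[row[key]].append(row)
    # Use .get so that looking up a missing key does not grow the index.
    return [(a, b) for a in first_rows for b in index.get(a[key], [])]


# Hypothetical rows: advertisement records and transaction records that both carry a card number.
ads = [{"card": "4111", "ad": "shoes"}, {"card": "4222", "ad": "books"}]
sales = [{"card": "4111", "amount": 74.56}]
pairs = attribute_by_shared_key(ads, sales, "card")
```

Each resulting pair links one row from each data set, so either row can then be updated with fields from the other.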

In some embodiments, explicit attribution is unavailable and other methods of attribution, such as trajectories, can be used. Trajectories can represent data in a data set that refers to the same entity. For example, as shown in FIG. 4, entries 401 and 403 both refer to “User 1” in the consuming entity identification category 420. Similarly, entries 404 and 405 can refer to “User 3.” This can indicate that entries 401 and 403 are two transactions for the same consumer. As previously stated, in some embodiments, the consuming entity identification category can be a credit card number or identifier. Using the information in entries 401 and 403, for example, data processing system 613 can create a trajectory that represents the user and includes the location and time of each entry. In some embodiments, more information can be included in the trajectory, such as the interaction amount or type of provisioning entity.

As shown in FIG. 5, entries 501 and 505 contain the same identifier category value. This value can, for example, represent a mobile device identifier. Data structure 500 of FIG. 5 can represent data provided to data processing system 613 in FIG. 6. As was done with entries 401 and 403 of data structure 400 of FIG. 4, data processing system 613 can create a trajectory using the information in entries 501 and 505. This trajectory can include the location and time categories and can represent locations where the mobile device was located.
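
One way to picture trajectory creation is to group records by their identifier and keep the location and time of each record in time order. The sketch below assumes records are dictionaries with `location` and `time` fields; those field names, the function name `build_trajectories`, and the sample values are hypothetical.

```python
from collections import defaultdict


def build_trajectories(records, id_field):
    """Group records by identifier into time-ordered lists of (location, time) references."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[id_field]].append((record["location"], record["time"]))
    return {
        entity: sorted(references, key=lambda ref: ref[1])
        for entity, references in grouped.items()
    }


# Hypothetical device records; ISO 8601 strings sort chronologically.
device_records = [
    {"device": "device-1", "location": "234 University Avenue", "time": "2013-11-23T10:32"},
    {"device": "device-2", "location": "Airport", "time": "2013-11-22T08:15"},
    {"device": "device-1", "location": "Main Street Cafe", "time": "2013-11-21T19:00"},
]
trajectories = build_trajectories(device_records, "device")
```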

After data processing system 613 has calculated trajectories for the provided data (e.g., data structure 400 of FIG. 4 and data structure 500 of FIG. 5), data processing system 613 can compare all of the calculated trajectories to search for agreement among the data sets. Agreement can refer to trajectories that contain similar or identical information. For example, the trajectory for entries 401 and 403 of FIG. 4 and the trajectory for entries 501 and 505 of FIG. 5 will both contain a reference to location “234 University Avenue” at 10:32. Further, in this example, the trajectory for entries 401 and 403 can contain a reference to a transaction on date 2013/11/21 at 19:00 and the trajectory for entries 501 and 505 can contain a reference to a location on 2013/11/21 at 19:00. The agreement between references of the two trajectories can indicate that “User 1” of FIG. 4 is attributable to the locations for identifier “ABCD5 . . . 567” of FIG. 5.
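
Agreement in its simplest, exact-match form can be sketched as the set of references two trajectories have in common. The function name `shared_references` is an assumption, and the location “Main Street Cafe” is a hypothetical stand-in because the passage above gives only a date and time for the second shared reference.

```python
def shared_references(trajectory_a, trajectory_b):
    """Return the (location, time) references present in both trajectories, ordered by time."""
    return sorted(set(trajectory_a) & set(trajectory_b), key=lambda ref: ref[1])


# Trajectory built from transaction records for one user (illustrative values).
user_1 = [
    ("Main Street Cafe", "2013-11-21T19:00"),
    ("234 University Avenue", "2013-11-23T10:32"),
]
# Trajectory built from location records for one device (illustrative values).
device = [
    ("Main Street Cafe", "2013-11-21T19:00"),
    ("Airport", "2013-11-22T08:15"),
    ("234 University Avenue", "2013-11-23T10:32"),
]
common = shared_references(user_1, device)
```

Two shared references out of two in the shorter trajectory is the kind of agreement that can support attributing the user to the device.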

After determining trajectories that agree among the provided data sets, data processing system 613 can attribute data in one data set to data in the other data set according to the overlapping trajectories. This attribution can then allow for data entries in both data sets that were not part of the overlapping trajectories to be associated with each other because they share a common unique identifier with data that has been attributed. This attribution can also allow for completion of the data sets. For example, following attribution of “User 1” to identifier “ABCD5 . . . 567” in the previous example, entry 403 of FIG. 4 can be updated to include the location indicated in entry 505 of FIG. 5.
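
Because attribution is made at the level of identifiers, it can be carried over to every record that shares an attributed identifier, including records that took no part in the overlapping trajectories. The following sketch assumes a mapping from identifiers in one data set to identifiers in the other; the names `apply_attribution`, `resolved_entity`, and `device-1` are hypothetical.

```python
def apply_attribution(records, id_field, matches, resolved_field="resolved_entity"):
    """Return copies of `records` annotated with the identifier attributed from the other data set.

    Records whose identifier has no attribution receive None.
    """
    return [{**record, resolved_field: matches.get(record[id_field])} for record in records]


# Hypothetical transaction records and a hypothetical attribution of one user to one device.
transactions = [
    {"user": "User 1", "amount": 74.56},
    {"user": "User 1", "amount": 12.10},
    {"user": "User 2", "amount": 5.00},
]
annotated = apply_attribution(transactions, "user", {"User 1": "device-1"})
```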

The level of agreement achievable among data sets can vary greatly with the nature of the data. The general uniqueness of entries in the database can be referred to as the data set's unicity. A high unicity indicates that there is little overlap of data across multiple individual entities in a data set. In some embodiments, data sets having high levels of unicity can provide sufficient levels of agreement with less overlap among trajectories. Data sets having lower levels of unicity may require more agreement among the data set trajectories before data processing system 613 can affirmatively determine that two trajectories refer to the same entity. After calculation of the trajectory matches, data processing system 613 can analyze the resulting agreement and determine how many trajectories in each data set agree with multiple different identities in the other data sets. If a trajectory in the first data set agrees with multiple trajectories in the second data set, data processing system 613 can determine that the agreement is not reliable for affirmatively matching specific entities in each data set.
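
The disclosure does not fix a formula for unicity. As one possible illustration only, unicity can be estimated as the fraction of distinct references that belong to exactly one entity, so that a value near 1 means little overlap across entities. The function name `unicity` and this particular measure are assumptions.

```python
from collections import Counter


def unicity(trajectories):
    """Fraction of distinct references that appear in exactly one entity's trajectory."""
    owners = Counter()
    for references in trajectories.values():
        for reference in set(references):  # count each entity once per reference
            owners[reference] += 1
    if not owners:
        return 0.0
    unique = sum(1 for count in owners.values() if count == 1)
    return unique / len(owners)


# Two hypothetical entities share one of three distinct references.
sample = {
    "a": [("x", 1), ("y", 2)],
    "b": [("x", 1), ("z", 3)],
}
score = unicity(sample)
```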

Data processing system 613 can adjust different variables to affect the confidence in the resulting agreements. For example, data processing system 613 can set a threshold value of the number of references included in each trajectory. Increasing the threshold value can provide higher levels of confidence in the resulting agreement, but may provide a lower number of attributable entities. This may be necessary for data sets having a low level of unicity. Conversely, decreasing the threshold value can provide a higher number of attributable entities, but may reduce the confidence that the agreement among trajectories uniquely correlates entities in both data sets.
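
The trade-off described above can be sketched by requiring a minimum number of agreeing references and by discarding any trajectory that agrees with more than one trajectory in the other data set. Treating the threshold as a count of agreeing references is an interpretation made for this sketch, and the names `match_trajectories` and `min_shared` are hypothetical.

```python
def match_trajectories(first, second, min_shared=2):
    """Map identifiers in `first` to identifiers in `second` when agreement is unambiguous.

    A pair is a candidate when at least `min_shared` references agree; an
    identifier with more than one candidate is treated as unreliable and dropped.
    """
    matches = {}
    for id_a, trajectory_a in first.items():
        refs_a = set(trajectory_a)
        candidates = [
            id_b
            for id_b, trajectory_b in second.items()
            if len(refs_a & set(trajectory_b)) >= min_shared
        ]
        if len(candidates) == 1:
            matches[id_a] = candidates[0]
    return matches


# Hypothetical trajectories of (location, time) references.
first = {"User 1": [("p", 1), ("q", 2)], "User 2": [("r", 3)]}
second = {"dev-A": [("p", 1), ("q", 2), ("s", 4)], "dev-B": [("p", 1)]}
strict = match_trajectories(first, second, min_shared=2)
loose = match_trajectories(first, second, min_shared=1)
```

With the stricter threshold, “User 1” resolves to a single device. With the looser threshold, “User 1” agrees with two devices and is therefore left unattributed.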

Moreover, aspects of the underlying data used in the trajectories can provide additional mechanisms to control the trajectory comparisons. For example, instead of data processing system 613 requiring specific location matches, data processing system 613 can use generic or imprecise locations in the trajectory calculations. For example, the location can be specified as, among other things, a city block, a city, a town, a general area, a point described in relation to distance or proximity to another place, or a radial location. A location can include a radius of a certain threshold distance around the location or area specified in the data record. In this example, a second location within a certain radial distance from the first location can be considered a location match.
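
A radial location match can be sketched with the haversine great-circle distance. The sketch assumes locations are available as latitude and longitude pairs in degrees; the 100 meter default radius and the function names are illustrative choices, not values from the disclosure.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def locations_match(point_a, point_b, radius_m=100.0):
    """Treat two (latitude, longitude) points as a match when within `radius_m` of each other."""
    return haversine_m(point_a[0], point_a[1], point_b[0], point_b[1]) <= radius_m


# Points about 56 meters apart match; points about 111 kilometers apart do not.
near = locations_match((37.4275, -122.1697), (37.4280, -122.1697))
far = locations_match((37.4275, -122.1697), (38.4275, -122.1697))
```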

In some embodiments, trajectories based on time and dates can include a time or date span instead of a specific date and time. In these embodiments, data processing system 613 can adjust for records that refer to the same entity but include differences in the data sets that are attributable to factors such as, for example, the methods of recording or data collection. Moreover, trajectories based on time and dates may, similarly to location data, use imprecise or generic data as part of the trajectory calculations. For example, time information such as “around 9 AM” that may or may not also include day information can be used when calculating trajectories. In some embodiments, time or date ranges can contribute to the trajectory calculations. As previously discussed, data of varying levels of specificity can be included in the trajectory calculations.
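
Matching on time spans can be sketched in two ways: comparing two timestamps within a tolerance, or testing whether two closed ranges overlap. The 15 minute default tolerance and the function names are assumptions for illustration.

```python
from datetime import datetime, timedelta


def within_tolerance(time_a, time_b, tolerance=timedelta(minutes=15)):
    """Treat two timestamps as a match when they differ by no more than `tolerance`."""
    return abs(time_a - time_b) <= tolerance


def spans_overlap(span_a, span_b):
    """Return True when two closed (start, end) ranges share at least one instant."""
    (start_a, end_a), (start_b, end_b) = span_a, span_b
    return start_a <= end_b and start_b <= end_a


# A transaction at 10:32 and a location fix at 10:40 agree within the tolerance.
close = within_tolerance(datetime(2013, 11, 23, 10, 32), datetime(2013, 11, 23, 10, 40))
# "Around 9 AM" modeled as 8:30 to 9:30 overlaps a visit recorded from 9:15 to 10:00.
overlap = spans_overlap(
    (datetime(2013, 11, 23, 8, 30), datetime(2013, 11, 23, 9, 30)),
    (datetime(2013, 11, 23, 9, 15), datetime(2013, 11, 23, 10, 0)),
)
```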

In some embodiments, data processing system 613 can further analyze the trajectories and data sets to determine the reliability of the resulting attributions. Data processing system 613 can perform a probabilistic analysis of records in the data sets to determine if the number of attributable matches resulting from the trajectory analysis is significantly more than would occur from a random sampling. This probabilistic analysis can provide a relative confidence in the trajectory analysis for a particular set of data sets. Based on the analysis, changes in the underlying attributes of the analysis can alter the accuracy of the attribution.
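
The disclosure does not name a specific probabilistic test. One common choice, shown here purely as an assumption, is a permutation test: the references of the second data set are randomly reassigned among its identifiers, and the number of attributable matches under that random baseline is compared with the observed number. All names in the sketch are hypothetical.

```python
import random


def count_attributable(first, second, min_shared):
    """Count identifiers in `first` that agree with at least one trajectory in `second`."""
    return sum(
        1
        for trajectory_a in first.values()
        if any(len(set(trajectory_a) & set(trajectory_b)) >= min_shared
               for trajectory_b in second.values())
    )


def permutation_p_value(first, second, min_shared=2, trials=200, seed=0):
    """Estimate how often randomly reassigned references match at least as well as observed."""
    rng = random.Random(seed)
    observed = count_attributable(first, second, min_shared)
    identifiers = list(second)
    sizes = [len(second[identifier]) for identifier in identifiers]
    pool = [reference for references in second.values() for reference in references]
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(pool)
        shuffled, start = {}, 0
        for identifier, size in zip(identifiers, sizes):
            shuffled[identifier] = pool[start:start + size]
            start += size
        if count_attributable(first, shuffled, min_shared) >= observed:
            at_least_as_good += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (at_least_as_good + 1) / (trials + 1)


# Hypothetical trajectories in which every user agrees fully with one device.
first = {"u1": [("a", 1), ("b", 2)], "u2": [("c", 3), ("d", 4)]}
second = {"d1": [("a", 1), ("b", 2)], "d2": [("c", 3), ("d", 4)]}
observed_matches = count_attributable(first, second, 2)
p_value = permutation_p_value(first, second, min_shared=2, trials=50, seed=1)
```

A small estimate suggests that the observed matches are unlikely to arise from random assignment; with only a handful of records, as in this toy input, the estimate carries little weight.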

After data processing system 613 makes an attribution of entities represented in each data set, future updates in either data set can also be correlated based on the original attribution. Data processing system 613 can store the attribution information in database 615 for future analysis and updates.

FIG. 7 is a flowchart representing an exemplary attribution process 700 for resolving data attributed to a single entity consistent with embodiments of the present disclosure. It will be readily appreciated that the illustrated procedure can be altered to delete steps or further include additional steps. Flowchart 700 starts at step 710. Attribution process 700 can obtain a first data set (step 720) and obtain a second data set (step 730). Each data set can consist of multiple records of data and each record of data can be associated with a particular entity. In some embodiments, each data set can contain multiple records associated with a single entity. Moreover, records in each data set can be associated with the same entity. For example, the first data set can represent location information related to mobile devices. Multiple records in the data set can represent the location of a specific device at multiple points in time. Further, for example, the second data set can represent financial transactions including credit card information. Similarly to the first data set, in this example, multiple records can refer to separate transactions using the same credit card.

After obtaining the data sets, attribution process 700 can determine at least one trajectory (step 740) based on records in the first data set and can determine at least one trajectory (step 750) based on records in the second data set. Each trajectory can be representative of records in its respective data set that are associated with the same entity. The trajectories can represent specific elements of multiple records in the data set. For example, a trajectory can represent multiple locations associated with the same entity or multiple time entries associated with the same entity. Further, in some embodiments, the trajectories can represent the locations, dates, times, amounts, or other details about multiple transaction records for a single credit card. Moreover, the values represented in the trajectories can vary depending on the specific data sources and applications. For example, location-based trajectories can include radial areas instead of only including specific coordinates. As another example, dates and times in a trajectory can include a date and/or time range instead of only a specific date and time.

After determining trajectories for both data sets, attribution process 700 can compare (step 760) the trajectories across the data sets as previously described in reference to FIG. 6. Attribution process 700 can associate (step 770) trajectories across the data sets that share similar elements. For example, a trajectory for the first data set that references multiple locations may be associated with a trajectory for the second data set that references the same locations in the same order. Attribution process 700 can compare trajectories much faster than other comparison systems that consider the entirety of each of the data sets and their records. The improved performance can allow comparisons of large data sets through their trajectories much more efficiently than previously possible.

After associations are made, attribution process 700 can resolve (step 780) an entity represented by records in the first data set with an entity represented by records in the second data set based on the associations made in step 770. In this way, attribution process 700 attributes data records in multiple sets of data to the same underlying entity. In doing so, consumers of the data sets can make better use of information that was previously unknown and unobtainable.

Embodiments of the present disclosure have been described herein with reference to numerous specific details that can vary from implementation to implementation. Certain adaptations and modifications of the described embodiments can be made. Other embodiments can be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the present disclosure being indicated by the following claims. It is also intended that the sequence of steps shown in the figures is only for illustrative purposes and is not intended to be limited to any particular sequence of steps. As such, it is appreciated that these steps can be performed in a different order while implementing the exemplary methods or processes disclosed herein.

What is claimed is:
 1. A method for attributing data to entities using disparate data sets, the method comprising: receiving a set of first records associated with an entity; determining a first trajectory associated with the entity based on the set of first records, the first trajectory indicating a behavior associated with the entity; identifying a second trajectory among a plurality of second records in a data repository, the second trajectory comprising a set of matching second records that indicate the behavior associated with the entity; and attributing the set of matching second records to the entity associated with the set of first records; wherein the method is performed using one or more processors.
 2. The method of claim 1, wherein the attributing the set of matching second records to the entity further comprises: determining that a number of records in the set of matching second records is higher than a threshold value; and attributing the set of matching second records to the entity associated with the set of first records in response to the determining that the number of records among the set of matching second records is higher than the threshold value.
 3. The method of claim 1 wherein the set of first records include an identifier that identifies the entity.
 4. The method of claim 3, wherein the attributing the set of matching second records to the entity associated with the set of first records includes: assigning the identifier that identifies the entity to the set of matching second records.
 5. The method of claim 1 wherein the set of first records include at least one selected from a group consisting of transaction data, social network data, consumer data, provisioning data, and product data.
 6. The method of claim 1, further comprising: accessing an object model associated with the plurality of second records; and transforming the set of first records based on the object model; wherein the determining a first trajectory comprises determining the first trajectory associated with the entity based on the set of transformed first records.
 7. The method of claim 1, wherein the identifying a second trajectory includes: determining a first unicity of the set of first records; determining a second unicity of the set of matching second records; and performing a comparison of the first unicity and the second unicity.
 8. A system for attributing data to entities using disparate data sets, the system comprising: one or more memories comprising instructions stored thereon; and one or more processors configured to execute the instructions and perform operations comprising: receiving a set of first records associated with an entity; determining a first trajectory associated with the entity based on the set of first records, the first trajectory indicating a behavior associated with the entity; identifying a second trajectory among a plurality of second records in a data repository, the second trajectory comprising a set of matching second records that indicate the behavior associated with the entity; and attributing the set of matching second records to the entity associated with the set of first records.
 9. The system of claim 8, wherein the attributing the set of matching second records to the entity further comprises: determining that a number of records in the set of matching second records is higher than a threshold value; and attributing the set of matching second records to the entity associated with the set of first records in response to the determining that the number of records among the set of matching second records is higher than the threshold value.
 10. The system of claim 8, wherein the set of first records include an identifier that identifies the entity.
 11. The method of claim 10, wherein the attributing the set of matching second records to the entity associated with the set of first records includes: assigning the identifier that identifies the entity to the set of matching second records.
 12. The system of claim 8, wherein the set of first records include at least one selected from a group consisting of transaction data, social network data, consumer data, provisioning data, and product data.
 13. The system of claim 8, wherein the operations further comprise: accessing an object model associated with the plurality of second records; and transforming the set of first records based on the object model; wherein the determining a first trajectory comprises determining the first trajectory associated with the entity based on the set of transformed first records.
 14. The system of claim 8, wherein the identifying a second trajectory includes: determining a first unicity of the set of first records; determining a second unicity of the matching set of first records; and performing a comparison of the first unicity and the second unicity.
 15. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving a set of first records associated with an entity; determining a first trajectory associated with the entity based on the set of first records, the first trajectory indicating a behavior associated with the entity; identifying a second trajectory among a plurality of second records in a data repository, the second trajectory comprising a set of matching second records that indicate the behavior associated with the entity; and attributing the set of matching second records to the entity associated with the set of first records.
 16. The non-transitory machine-readable storage medium of claim 15, wherein the attributing the set of matching second records to the entity further comprises: determining that a number of records in the set of matching second records is higher than a threshold value; and attributing the set of matching second records to the entity associated with the set of first records in response to the determining that the number of records among the set of matching second records is higher than the threshold value.
 17. The non-transitory machine-readable storage medium of claim 15, wherein the set of first records include an identifier that identifies the entity.
 18. The non-transitory machine-readable storage medium of claim 15, wherein the set of first records include at least one selected from a group consisting of transaction data, social network data, consumer data, provisioning data, and product data.
 19. The non-transitory machine-readable storage medium of claim 15, wherein the operations further comprise: accessing an object model associated with the plurality of second records; and transforming the set of first records based on the object model; wherein the determining a first trajectory comprises determining the first trajectory associated with the entity based on the set of transformed first records.
 20. The non-transitory machine-readable storage medium of claim 15, wherein the identifying the set of matching second records includes: determining a first unicity of the set of first records; determining a second unicity of the matching set of first records; and performing a comparison of the first unicity and the second unicity.